Summary


The solution to the missing value equation in designed experiments with general covariance structure is shown to be identical to the "best" predictor of the missing data based on the observed data.
1. Introduction


Between 1952 and 1957 a number of papers appeared (Snedecor and Williams (1952, 1953), Nelder (1954), Tukey (1954), Norton (1955) and Fairfield Smith (1957)) discussing the role and meaning of estimates of missing yields in designed experiments. These followed Snedecor's answer to a query concerning a negative value obtained as the replacement for a missing observation on the number of flies caught with different types of bait. It was stated that, although the negative value was the solution of the missing value equation and the usual analysis of the completed set of data led to the correct estimates of the effects, the value was not meant to "estimate" the missing yield. The appearance of an impossible replacement value was to be regarded as evidence that the data did not conform to the assumed model rather than as a defect in the missing value procedure. In Snedecor's example, a logarithmic transformation of the data turned out to be more appropriate.

Further light was thrown on this problem by Fairfield Smith, who noted that one can indeed regard the replacement value as an estimate, either of the actual missing yield or of its expected value under the model. His main point was that the variance one ascribes to the estimate depends on what it is regarded as estimating. The variance is larger when the yield itself is being estimated, because the lost value is not identical to its expectation but deviates from it by a random error whose variance must be added to that of the estimator of the mean.

The preceding discussion took place entirely in the context of uncorrelated observations; problems concerning incomplete data under models with more complex error structure (e.g. split-plots, BIBDs) have generally received little attention. Early references in this area include Anderson (1946), who gave estimates based on minimizing the subplot error sum of squares in a split-plot experiment, and a series of papers by Cornish (1943, 1944, 1956) dealing with the recovery of interblock information with incomplete data for a variety of block designs. Contributions framed within the analysis of covariance can also be found in Coons (1957) and Truitt and Fairfield Smith (1956).

In this paper we discuss missing value estimates in models with general error structure and show what is almost obvious with uncorrelated observations: the "best" predictor of the missing yield coincides with the correct replacement value. Our analysis combines results from a recent paper (Houtman and Speed, 1979) examining missing value problems in models with general error structure with results from Houtman (1979) concerned with best linear unbiased prediction. The notation and terminology of this second paper (referred to as [H]) will be used in what follows.


2. Best linear unbiased prediction

The problem of predicting one random variable from others has received a lot of attention in the time series literature and also, to a lesser extent, within the standard linear model. This work goes back at least to Henderson (1963), more recent references being G.S. Watson (1972), Searle (1974), Harville (1976) and [H].


The $n$-dimensional space $\mathcal{Y}$ of full data arrays may be decomposed into a direct sum of the space $\mathcal{Y}_1$ of observed data and the space $\mathcal{Y}_2$ of unobserved data, and we denote by $D_1$ and $D_2$ the projections onto $\mathcal{Y}_1$ and $\mathcal{Y}_2$, orthogonal with respect to the inner product $\langle x, y\rangle = x'y$. As in [H] write

$$ y = D_1 y + D_2 y = y_1 + y_2, \qquad y \in \mathcal{Y}. $$
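If we suppose our full data satisfy

$$ E y = \tau \in \mathcal{V}, \quad \mathcal{V} \text{ a subspace of } \mathcal{Y}, \qquad \operatorname{Var} y = V, \quad V \text{ known, positive definite}, \tag{1} $$

then the observed data have

$$ E y_1 = D_1\tau = \tau_1 \in D_1\mathcal{V} = \mathcal{V}_1, \qquad \operatorname{Var} y_1 = D_1 V D_1; \tag{2} $$

the unobserved data $y_2$ satisfy

$$ E y_2 = D_2\tau = \tau_2 \in D_2\mathcal{V} = \mathcal{V}_2, \qquad \operatorname{Var} y_2 = D_2 V D_2, \qquad \operatorname{cov}(y_1, y_2) = D_1 V D_2. \tag{3} $$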


A best linear unbiased predictor (BLUP) of $y_2$ based on $y_1$ is an array $\hat y_2 = \hat A y_1$, where $\hat A$ is a linear transformation on $\mathcal{Y}$ such that

$$ \hat A \tau_1 = \tau_2 \quad \forall\, \tau \in \mathcal{V}, $$

and

$$ \min_{A:\ A\tau_1 = \tau_2\ \forall \tau \in \mathcal{V}} E\|A y_1 - y_2\|^2 $$

is attained at $A = \hat A$. The solution is unique whenever $\dim \mathcal{V}_1 = \dim \mathcal{V}$ (this will be assumed to be the case in the sequel) and can be written

$$ \hat y_2 = \hat\tau_2 + B(y_1 - \hat\tau_1), \tag{4} $$

where $\hat\tau_2$ is the best linear unbiased estimator (BLUE) of $\tau_2$ based on $y_1$, $B$ is the product of the covariance $D_2 V D_1$ with an effective inverse of $D_1 V D_1$, and $y_1 - \hat\tau_1$ is the residual after fitting the observed model. At this stage we can observe that if $y_1$ and $y_2$ are uncorrelated, then $B = 0$ and the BLUP of $y_2$ is identical to the BLUE of the expected value of $y_2$.
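Formula (4) can be checked numerically. The following is a minimal Python sketch, not the authors' computation: it assumes a straight-line model for five equally spaced plots (so $\mathcal{V}$ is spanned by the columns of a design matrix `X`), a hypothetical AR(1)-type covariance $V_{ij} = (1/2)^{|i-j|}$, made-up yields, and the last plot missing. The index lists `obs` and `mis` stand in for the projections $D_1$ and $D_2$, `blup` is a helper name introduced here, and exact rational arithmetic (`fractions.Fraction`) is used so that identities hold exactly.

```python
from fractions import Fraction as F


def transpose(a):
    return [list(r) for r in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(a):
    """Exact Gauss-Jordan inverse over Fractions; assumes `a` is nonsingular."""
    n = len(a)
    m = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = next(r for r in range(c, n) if m[r][c] != 0)  # pivot row
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [x / piv for x in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [row[n:] for row in m]


def blup(X, V, y_obs, obs, mis):
    """BLUP (4): yhat_2 = tauhat_2 + B (y_1 - tauhat_1), with B = V21 V11^{-1}."""
    X1, X2 = [X[i] for i in obs], [X[i] for i in mis]
    V11 = [[V[i][j] for j in obs] for i in obs]
    V21 = [[V[i][j] for j in obs] for i in mis]
    W = inverse(V11)  # V11 is nonsingular, so its inverse serves as the effective inverse
    XtW = matmul(transpose(X1), W)
    beta = matmul(inverse(matmul(XtW, X1)), matmul(XtW, [[y] for y in y_obs]))  # GLS on y_1
    tau1, tau2 = matmul(X1, beta), matmul(X2, beta)    # BLUEs of tau_1 and tau_2
    resid = [[y - t[0]] for y, t in zip(y_obs, tau1)]  # y_1 - tauhat_1
    B = matmul(V21, W)
    yhat2 = [t[0] + b[0] for t, b in zip(tau2, matmul(B, resid))]
    return yhat2, B, resid, XtW


# Hypothetical example: five plots on a line, the last one missing.
X = [[1, s] for s in range(5)]
V = [[F(1, 2 ** abs(i - j)) for j in range(5)] for i in range(5)]
yhat2, B, resid, XtW = blup(X, V, [3, 5, 4, 7], obs=[0, 1, 2, 3], mis=[4])
print(yhat2, B)
```

With this covariance, $B = V_{21}V_{11}^{-1}$ reduces to $(0, 0, 0, 1/2)$, so only the residual of the neighbouring plot enters; the sketch returns $\hat y_2 = 755/93 \approx 8.12$, against the estimated mean $\hat\tau_2 = 742/93$.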


3. The missing value equations

It was suggested by R.A. Fisher (see Yates, 1933) that replacements for missing yields in a designed experiment can be obtained by substituting unknowns for them and minimizing the residual sum of squares; the validity of this process was shown by Yates (1933).


Using the notation introduced for the prediction problem, let $y_1 \in \mathcal{Y}_1$ denote the observed yields and $y_2 \in \mathcal{Y}_2$ the missing ones, expectations and covariances continuing to be given by (1), (2) and (3). Still following Fisher's idea, let us fit the model $\mathcal{V}$ to $y_1 + y_2^*$, where $y_2^* \in \mathcal{Y}_2$ denotes a set of parameters replacing the lost yields. The fitted value $\hat\tau$ is then the weighted projection of $y_1 + y_2^*$ onto $\mathcal{V}$:

$$ \hat\tau = P^V_{\mathcal{V}}(y_1 + y_2^*), \tag{5} $$

where $P^V_{\mathcal{A}}$ denotes the weighted projection onto a subspace $\mathcal{A}$ of $\mathcal{Y}$, orthogonal with respect to the inner product $\langle x, y\rangle_V = x'V^{-1}y$.
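When $\mathcal{A}$ is the column span of a full-rank matrix $A$, the weighted projection has the familiar matrix form $P^V_{\mathcal{A}} = A(A'V^{-1}A)^{-1}A'V^{-1}$. The Python sketch below builds it with exact rational arithmetic; the design matrix `X` (a straight line through five plots), the covariance $V_{ij} = (1/2)^{|i-j|}$ and the trial value 8 for the missing yield are hypothetical choices for illustration, and `proj_V` is a helper name introduced here.

```python
from fractions import Fraction as F


def transpose(a):
    return [list(r) for r in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(a):
    """Exact Gauss-Jordan inverse over Fractions; assumes `a` is nonsingular."""
    n = len(a)
    m = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = next(r for r in range(c, n) if m[r][c] != 0)  # pivot row
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [x / piv for x in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [row[n:] for row in m]


def proj_V(A, V):
    """P^V_A: projection onto span(A), orthogonal w.r.t. <x, y>_V = x' V^{-1} y."""
    Vi = inverse(V)
    AtVi = matmul(transpose(A), Vi)
    return matmul(matmul(A, inverse(matmul(AtVi, A))), AtVi)


X = [[1, s] for s in range(5)]                                     # columns span the model space
V = [[F(1, 2 ** abs(i - j)) for j in range(5)] for i in range(5)]  # hypothetical covariance
P = proj_V(X, V)
z = [[3], [5], [4], [7], [8]]  # y_1 + y_2^*, with 8 a hypothetical trial replacement
tau_hat = matmul(P, z)         # fitted value, as in (5)
```

The defining properties can be checked exactly: `P` is idempotent, reproduces the columns of `X`, and is self-adjoint in the $V^{-1}$ inner product, i.e. $P'V^{-1} = V^{-1}P$.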

The missing value estimates are then obtained by minimizing

$$ \|y_1 + y_2^* - \hat\tau\|_V^2 $$

over $y_2^* \in \mathcal{Y}_2$, where $\|\cdot\|_V$ is the norm associated with the inner product $\langle\cdot,\cdot\rangle_V$ described above. By least squares theory, the solution is given by

$$ y_2^* = P^V_{\mathcal{Y}_2}(\hat\tau - y_1), \tag{6} $$

where $\hat\tau$ is given by (5). On substituting (5) into (6) we conclude that the solution $y_2^*$ must satisfy the equation

$$ y_2^* = P^V_{\mathcal{Y}_2}\bigl[P^V_{\mathcal{V}}(y_1 + y_2^*) - y_1\bigr]. \tag{7} $$


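The solution to (7) is unique whenever $\mathcal{V} \cap \mathcal{Y}_2 = \{0\}$, i.e. whenever $\dim \mathcal{V}_1 = \dim \mathcal{V}$ (see [H]). Equivalently, $\|y_1 + y_2^* - \tau\|_V^2$ can be minimized over $\mathcal{Y}_2$ first and over $\mathcal{V}$ next, leading to equations (6) and (5). Now, by substituting (6) into (5), we obtain an equation for the fitted values:

$$ \hat\tau = P^V_{\mathcal{V}}\bigl[y_1 + P^V_{\mathcal{Y}_2}(\hat\tau - y_1)\bigr]. \tag{8} $$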
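The alternating minimization just described suggests solving (7) iteratively: compute $\hat\tau$ from (5), update $y_2^*$ from (6), and repeat. Under the uniqueness condition $\mathcal{V}\cap\mathcal{Y}_2 = \{0\}$, the linear part of this map, $P^V_{\mathcal{Y}_2}P^V_{\mathcal{V}}$ restricted to $\mathcal{Y}_2$, strictly shrinks every nonzero vector in the $\|\cdot\|_V$ norm, so the iteration converges. The Python sketch below is illustrative only: the straight-line model, the covariance $V_{ij} = (1/2)^{|i-j|}$, the yields and the helper names (`proj_V`, `solve_missing`) are assumptions, and vectors are kept full length with zeros outside $\mathcal{Y}_1$ or $\mathcal{Y}_2$, mirroring $y_1 = D_1 y$ and $y_2^* \in \mathcal{Y}_2$. The projections are computed exactly and the iteration itself runs in floating point.

```python
from fractions import Fraction as F


def transpose(a):
    return [list(r) for r in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(a):
    """Exact Gauss-Jordan inverse over Fractions; assumes `a` is nonsingular."""
    n = len(a)
    m = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = next(r for r in range(c, n) if m[r][c] != 0)  # pivot row
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [x / piv for x in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [row[n:] for row in m]


def proj_V(A, V):
    """P^V_A: projection onto span(A), orthogonal w.r.t. <x, y>_V = x' V^{-1} y."""
    Vi = inverse(V)
    AtVi = matmul(transpose(A), Vi)
    return matmul(matmul(A, inverse(matmul(AtVi, A))), AtVi)


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def solve_missing(y1, P_V, P_2, tol=1e-12, max_iter=1000):
    """Fixed-point iteration for (7), alternating steps (5) and (6)."""
    y2 = [0.0] * len(y1)                                     # starting value for y_2^*
    for _ in range(max_iter):
        tau = matvec(P_V, [a + b for a, b in zip(y1, y2)])   # step (5)
        new = matvec(P_2, [t - a for t, a in zip(tau, y1)])  # step (6)
        if max(abs(a - b) for a, b in zip(new, y2)) < tol:
            return new
        y2 = new
    raise RuntimeError("iteration did not converge")


n = 5
X = [[1, s] for s in range(n)]
V = [[F(1, 2 ** abs(i - j)) for j in range(n)] for i in range(n)]
E = [[1] if i == 4 else [0] for i in range(n)]           # Y_2 = span of the last coordinate
P_V = [[float(x) for x in row] for row in proj_V(X, V)]  # P^V_V as floats
P_2 = [[float(x) for x in row] for row in proj_V(E, V)]  # P^V_{Y_2} as floats
y1 = [3.0, 5.0, 4.0, 7.0, 0.0]                           # observed yields, zero in the missing cell
y2_star = solve_missing(y1, P_V, P_2)
print(y2_star[4])
```

Starting from zero is an arbitrary choice; by the uniqueness of the solution of (7), any starting value in $\mathcal{Y}_2$ leads to the same limit.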


The following result, whose proof can be found in Houtman and Speed (1979), shows that the fitted values obtained from the completed set of data give the correct fit for the observed model.

Theorem: The BLUE $P^{D_1VD_1}_{\mathcal{V}_1} y_1$ of $\tau_1$ based on $y_1$ coincides with $D_1\hat\tau$, where $\hat\tau$ is a solution of (8). Equivalently, using (6),

$$ P^{D_1VD_1}_{\mathcal{V}_1} y_1 = D_1 P^V_{\mathcal{V}}(y_1 + y_2^*), \tag{9} $$

where $y_2^*$ satisfies equation (7).


4. Missing value estimate and BLUP are identical

We now combine the formulae from the preceding two sections to prove our main assertion, namely that, with $V$ known up to a scalar factor, the BLUP and the missing value estimate coincide.

Let $M$ denote the linear operator on $\mathcal{Y}_1$, with values in $\mathcal{V}$, such that

$$ D_1 M z = z, \quad z \in \mathcal{V}_1; \qquad M u = 0, \quad u \in \mathcal{Y}_1 \ominus \mathcal{V}_1. \tag{10} $$


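Then, if $\hat\tau_1 = P^{D_1VD_1}_{\mathcal{V}_1} y_1$ is the BLUE of $\tau_1$ based on $y_1$, $M\hat\tau_1 = \tilde\tau$ is the BLUE of $\tau$ based on $y_1$, and $D_2\tilde\tau = \hat\tau_2$ is the BLUE of $\tau_2$ based on $y_1$. If $(D_1VD_1)^-$ denotes an effective inverse of $D_1VD_1$, then $\hat y_2$ has the representation

$$ \begin{aligned} \hat y_2 &= \hat\tau_2 + (D_2VD_1)(D_1VD_1)^-\bigl(I - P^{D_1VD_1}_{\mathcal{V}_1}\bigr) y_1 \\ &= \bigl[D_2 - (D_2VD_1)(D_1VD_1)^-\bigr](\tilde\tau - y_1) \\ &= P^V_{\mathcal{Y}_2}(\tilde\tau - y_1), \end{aligned} \tag{11} $$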


where $\tilde\tau$ is such that $D_1\tilde\tau = \hat\tau_1$.
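The last equality in the representation of $\hat y_2$ rests on the identity $P^V_{\mathcal{Y}_2} = D_2 - (D_2VD_1)(D_1VD_1)^-$. A short block-matrix check, assuming the coordinates are ordered with the observed entries first, $V_{11}$ denotes the (nonsingular) observed block of $V$, and the effective inverse is taken to be $V_{11}^{-1}$ on $\mathcal{Y}_1$ and zero on $\mathcal{Y}_2$:

$$
V=\begin{pmatrix}V_{11}&V_{12}\\ V_{21}&V_{22}\end{pmatrix},\qquad
S=V_{22}-V_{21}V_{11}^{-1}V_{12},\qquad
V^{-1}=\begin{pmatrix}V_{11}^{-1}+V_{11}^{-1}V_{12}S^{-1}V_{21}V_{11}^{-1} & -V_{11}^{-1}V_{12}S^{-1}\\ -S^{-1}V_{21}V_{11}^{-1} & S^{-1}\end{pmatrix}.
$$

With $E=\begin{pmatrix}0\\ I\end{pmatrix}$ spanning $\mathcal{Y}_2$ we have $E'V^{-1}E=S^{-1}$, so for $x=(x_1,x_2)$

$$
P^V_{\mathcal{Y}_2}x=E\,(E'V^{-1}E)^{-1}E'V^{-1}x
=E\,S\bigl(S^{-1}x_2-S^{-1}V_{21}V_{11}^{-1}x_1\bigr)
=\begin{pmatrix}0\\ x_2-V_{21}V_{11}^{-1}x_1\end{pmatrix}
=D_2x-(D_2VD_1)(D_1VD_1)^-x .
$$

Setting $x=\tilde\tau-y_1$ gives the last equality.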


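Using the theorem of Section 3, $D_1\hat\tau = \hat\tau_1 = D_1\tilde\tau$, where $\hat\tau$ satisfies (8); since $D_1$ is one-to-one on $\mathcal{V}$ when $\dim\mathcal{V}_1 = \dim\mathcal{V}$, it follows that

$$ \tilde\tau = \hat\tau. $$

Hence, by comparing (11) with (6), we conclude that the solution to the missing value equation is exactly the best linear unbiased predictor of the missing observations obtained from the observed ones.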
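The argument can be checked numerically on a small case. The Python sketch below assumes a hypothetical straight-line model for five plots with the last one missing, covariance $V_{ij} = (1/2)^{|i-j|}$ and made-up yields. It computes the BLUP from (4) and, separately, solves (7) exactly as the linear system $(I - KP^V_{\mathcal{V}}E)u = K(P^V_{\mathcal{V}} - I)y_1$, where $E$ spans $\mathcal{Y}_2$, $y_2^* = Eu$ and $P^V_{\mathcal{Y}_2} = EK$ with $K = (E'V^{-1}E)^{-1}E'V^{-1}$. The helper names are introduced here for illustration; rational arithmetic makes the comparison exact.

```python
from fractions import Fraction as F


def transpose(a):
    return [list(r) for r in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(a):
    """Exact Gauss-Jordan inverse over Fractions; assumes `a` is nonsingular."""
    n = len(a)
    m = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = next(r for r in range(c, n) if m[r][c] != 0)  # pivot row
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [x / piv for x in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [row[n:] for row in m]


def proj_V(A, V):
    """P^V_A: projection onto span(A), orthogonal w.r.t. <x, y>_V = x' V^{-1} y."""
    AtVi = matmul(transpose(A), inverse(V))
    return matmul(matmul(A, inverse(matmul(AtVi, A))), AtVi)


def blup(X, V, y_obs, obs, mis):
    """BLUP (4) from the observed data alone; also returns the BLUE tauhat_1."""
    X1, X2 = [X[i] for i in obs], [X[i] for i in mis]
    W = inverse([[V[i][j] for j in obs] for i in obs])
    XtW = matmul(transpose(X1), W)
    beta = matmul(inverse(matmul(XtW, X1)), matmul(XtW, [[y] for y in y_obs]))
    tau1, tau2 = matmul(X1, beta), matmul(X2, beta)
    B = matmul([[V[i][j] for j in obs] for i in mis], W)
    corr = matmul(B, [[y - t[0]] for y, t in zip(y_obs, tau1)])
    return [t[0] + c[0] for t, c in zip(tau2, corr)], [t[0] for t in tau1]


def missing_value_solution(X, V, y1, E):
    """Solve (7) exactly for u, where y_2^* = E u; also return tauhat from (5)."""
    P = proj_V(X, V)
    EtVi = matmul(transpose(E), inverse(V))
    K = matmul(inverse(matmul(EtVi, E)), EtVi)  # P^V_{Y_2} = E K
    KPE = matmul(K, matmul(P, E))
    k = len(KPE)
    lhs = [[int(i == j) - KPE[i][j] for j in range(k)] for i in range(k)]
    rhs = matmul(K, [[p[0] - y[0]] for p, y in zip(matmul(P, y1), y1)])
    u = matmul(inverse(lhs), rhs)
    tau_hat = matmul(P, [[a[0] + b[0]] for a, b in zip(y1, matmul(E, u))])
    return [r[0] for r in u], [r[0] for r in tau_hat]


X = [[1, s] for s in range(5)]
V = [[F(1, 2 ** abs(i - j)) for j in range(5)] for i in range(5)]
yhat2, tau1 = blup(X, V, [3, 5, 4, 7], obs=[0, 1, 2, 3], mis=[4])
u, tau_hat = missing_value_solution(X, V, [[3], [5], [4], [7], [0]],
                                    [[0], [0], [0], [0], [1]])
print(yhat2 == u, tau_hat[:4] == tau1)
```

Both comparisons hold exactly in this example, the common value of $\hat y_2$ and $y_2^*$ being $755/93$; the second comparison is the theorem of Section 3, $D_1\hat\tau$ equalling the BLUE of $\tau_1$.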


This conclusion can be re-expressed as follows: the problem of finding $\hat y_2 = \hat A y_1$ such that

$$ E\|\hat A y_1 - y_2\|^2 $$

is minimum subject to $\hat A\tau_1 = \tau_2$ for all $\tau \in \mathcal{V}$ is equivalent to that of finding $y_2^*$ such that

$$ \|y_1 + y_2^* - \tau\|_V^2 $$

is minimum over all $\tau \in \mathcal{V}$ and over all $y_2^* \in \mathcal{Y}_2$.
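The joint minimization over $\tau = X\beta \in \mathcal{V}$ and $y_2^* = Eu \in \mathcal{Y}_2$ is itself a generalized least squares problem: $\|y_1 + Eu - X\beta\|_V^2 = \|y_1 - Z\theta\|_V^2$ with augmented design $Z = [X, -E]$ and $\theta = (\beta, u)$, and, for $X$ of full rank, $Z$ has full column rank exactly when $\mathcal{V} \cap \mathcal{Y}_2 = \{0\}$. A minimal Python sketch under hypothetical assumptions (straight-line model for five plots with the last one missing, $V_{ij} = (1/2)^{|i-j|}$, made-up yields; helper names introduced here) solves it in one step and compares $u$ with the BLUP computed from (4).

```python
from fractions import Fraction as F


def transpose(a):
    return [list(r) for r in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(a):
    """Exact Gauss-Jordan inverse over Fractions; assumes `a` is nonsingular."""
    n = len(a)
    m = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = next(r for r in range(c, n) if m[r][c] != 0)  # pivot row
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [x / piv for x in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [row[n:] for row in m]


def blup(X, V, y_obs, obs, mis):
    """BLUP (4): tauhat_2 + V21 V11^{-1} (y_1 - tauhat_1), from the observed data only."""
    X1, X2 = [X[i] for i in obs], [X[i] for i in mis]
    W = inverse([[V[i][j] for j in obs] for i in obs])
    XtW = matmul(transpose(X1), W)
    beta = matmul(inverse(matmul(XtW, X1)), matmul(XtW, [[y] for y in y_obs]))
    tau1, tau2 = matmul(X1, beta), matmul(X2, beta)
    B = matmul([[V[i][j] for j in obs] for i in mis], W)
    corr = matmul(B, [[y - t[0]] for y, t in zip(y_obs, tau1)])
    return [t[0] + c[0] for t, c in zip(tau2, corr)]


def joint_minimizer(X, V, y1, E):
    """GLS of y1 on Z = [X, -E]: minimizes ||y1 + E u - X beta||_V^2 over (beta, u)."""
    Z = [xr + [-e for e in er] for xr, er in zip(X, E)]
    ZtVi = matmul(transpose(Z), inverse(V))
    theta = matmul(inverse(matmul(ZtVi, Z)), matmul(ZtVi, y1))
    p = len(X[0])
    return [r[0] for r in theta[:p]], [r[0] for r in theta[p:]]


X = [[1, s] for s in range(5)]
V = [[F(1, 2 ** abs(i - j)) for j in range(5)] for i in range(5)]
E = [[0], [0], [0], [0], [1]]  # Y_2 = span of the last coordinate
beta, u = joint_minimizer(X, V, [[3], [5], [4], [7], [0]], E)
yhat2 = blup(X, V, [3, 5, 4, 7], obs=[0, 1, 2, 3], mis=[4])
print(beta, u, yhat2)
```

In this example $u = 755/93$ agrees with the BLUP, and $\beta$ equals the generalized least squares estimate $(274/93, 117/93)$ obtained from the observed data alone, in line with the theorem of Section 3.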

We close with two remarks. First, whenever the procedures just discussed are applied in practice, an estimate of $V$ must be used; ways of obtaining one are explained in Houtman and Speed (1979).

Second, the interpretation of solutions of missing value equations as predictors of those values provides a strong argument that the underlying model is unsuitable whenever unreasonable replacements arise.